Supersparse Linear Integer Models for Predictive Scoring Systems
Abstract
Scoring systems are classification models that make predictions using a sparse linear combination of variables with integer coefficients. Such systems are frequently used in medicine because they are interpretable; that is, they only require users to add, subtract and multiply a few meaningful numbers in order to make a prediction. See, for instance, the commonly used scoring systems of Gage et al. (2001), Le Gall et al. (1984), Le Gall, Lemeshow, and Saulnier (1993), and Knaus et al. (1985). Scoring systems strike a delicate balance between accuracy and interpretability that is difficult to replicate with existing machine learning algorithms. Current linear methods such as the lasso, elastic net and LARS are not designed to create scoring systems, since their regularization is primarily used to improve accuracy rather than sparsity and interpretability (Tibshirani 1996; Zou and Hastie 2005; Efron et al. 2004). These methods can produce very sparse models through heavy regularization or feature selection (Guyon and Elisseeff 2003); however, feature selection often relies on greedy optimization and cannot guarantee an optimal balance between sparsity and accuracy. Moreover, the interpretability of scoring systems requires integer coefficients, which these methods do not produce. Existing approaches to interpretable modeling include decision trees and lists (Rüping 2006; Quinlan 1986; Rivest 1987; Letham et al. 2013). We introduce a formal approach for creating scoring systems, called Supersparse Linear Integer Models (SLIM). SLIM produces scoring systems that are accurate and interpretable using a mixed-integer program (MIP) whose objective penalizes the training error together with the L0-norm and L1-norm of the coefficients. SLIM can create scoring systems for datasets with thousands of training examples and tens to hundreds of features, which is larger than the size of most studies in medicine, where scoring systems are often used.
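To make the objective described in the abstract concrete, the following is a minimal sketch rather than the authors' method: it replaces the mixed-integer program with exhaustive search over small integer coefficient vectors, which is only feasible for a handful of features. The function names, the penalty weights `c0` and `eps`, the coefficient bound, and the toy data are all assumptions chosen for illustration. For simplicity the sketch penalizes every coefficient, including the intercept column.

```python
from itertools import product


def slim_objective(coefs, X, y, c0=0.01, eps=0.001):
    """Misclassification rate + c0 * L0-norm + eps * L1-norm of coefs.

    c0 and eps are illustrative penalty weights, not values from the paper.
    """
    errors = 0
    for row, label in zip(X, y):
        score = sum(w * x for w, x in zip(coefs, row))
        pred = 1 if score > 0 else -1  # ties (score == 0) predict -1
        errors += pred != label
    l0 = sum(1 for w in coefs if w != 0)  # number of nonzero coefficients
    l1 = sum(abs(w) for w in coefs)       # sum of absolute values
    return errors / len(y) + c0 * l0 + eps * l1


def brute_force_scoring_system(X, y, bound=2, c0=0.01, eps=0.001):
    """Search all integer vectors in [-bound, bound]^d (a stand-in for the MIP)."""
    n_features = len(X[0])
    candidates = product(range(-bound, bound + 1), repeat=n_features)
    return min(candidates, key=lambda w: slim_objective(w, X, y, c0, eps))


# Hypothetical toy data: column 0 is a constant intercept term,
# and the label (+1 / -1) depends only on column 1.
X = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
y = [1, 1, -1, -1]

best = brute_force_scoring_system(X, y)
print(best)  # → (0, 1, 0, 0)
```

On this toy data the search keeps a single coefficient of 1 on the informative feature, because any additional or larger coefficient increases the L0 or L1 penalty without reducing the training error. A real MIP solver would be needed for datasets of the size the abstract mentions.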
Similar resources
Interpretable Classification Models for Recidivism Prediction
We investigate a long-debated question, which is how to create predictive models of recidivism that are sufficiently accurate, transparent, and interpretable to use for decision-making. This question is complicated as these models are used to support different decisions, from sentencing, to determining release on probation, to allocating preventative social services. Each case might have an obj...
Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?
Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve on the predictive performance of each individual model. This is especially true when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...
Effect of education on the knowledge and attitude of intensive care unit staff towards the use of predictive disease severity scoring systems
Background and Purpose: Severity-of-illness scoring systems are used to classify patients for receiving medical services, predict the risk of mortality, determine hospital bed occupancy, and assess treatment progress. In Iran, these scoring systems are not frequently used due to the lack of knowledge of medical staff. This study aimed to evaluate the effect of education on the knowl...
A comparison of CRIB, CRIB II, SNAP, SNAPII and SNAP-PE scores for prediction of mortality in critically ill neonates
Background: Clinical Risk Index for Babies (CRIB), Score for Neonatal Acute Physiology (SNAP), an update of the Clinical Risk Index for Babies score (CRIB II) and Score for Neonatal Acute Physiology - Perinatal Extension (SNAP-PE) are scoring devices developed in neonatal intensive care units. This study reviewed these scoring systems in critically ill neonates to determine how well th...
Fuzzy Reliability Optimization Models for Redundant Systems
In this paper, a special class of redundancy optimization problem with fuzzy random variables is presented. In this model, fuzzy random lifetimes are considered as basic parameters and the Er-expected of system lifetime is used as a major type of system performance. Then a redundancy optimization problem is formulated as a binary integer programming model. Furthermore, illustrative numerical ex...